Goto

Collaborating Authors

 monte carlo dropout


Quantifying Uncertainty in Machine Learning-Based Pervasive Systems: Application to Human Activity Recognition

Balditsyn, Vladimir, Lalanda, Philippe, Vega, German, Chollet, Stéphanie

arXiv.org Artificial Intelligence

The recent convergence of pervasive computing and machine learning has given rise to numerous services, impacting almost all areas of economic and social activity. However, the use of AI techniques precludes certain standard software development practices, which emphasize rigorous testing to ensure the elimination of all bugs and adherence to well-defined specifications. ML models are trained on numerous high-dimensional examples rather than being manually coded. Consequently, the boundaries of their operating range are uncertain, and they cannot guarantee absolute error-free performance. In this paper, we propose to quantify uncertainty in ML-based systems. To achieve this, we propose to adapt and jointly utilize a set of selected techniques to evaluate the relevance of model predictions at runtime. We apply and evaluate these proposals in the highly heterogeneous and evolving domain of Human Activity Recognition (HAR). The results presented demonstrate the relevance of the approach, and we discuss in detail the assistance provided to domain experts.


Enhancing Multi-Label Thoracic Disease Diagnosis with Deep Ensemble-Based Uncertainty Quantification

Laksara, Yasiru, Thayasivam, Uthayasanker

arXiv.org Artificial Intelligence

Abstract--The utility of deep learning models, such as CheXNet, in high stakes clinical settings is fundamentally constrained by their purely deterministic nature, failing to provide reliable measures of predictive confidence. Initial architectural development failed to stabilize performance and calibration using Monte Carlo Dropout (MCD), yielding an unacceptable Expected Calibration Error (ECE) of 0.7588. This technical failure necessitated a rigorous architectural pivot to a high diversity, 9-member Deep Ensemble (DE). This resulting DE successfully stabilized performance and delivered superior reliability, achieving a State-of-the-Art (SOT A) average Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.8559 and an average F1 Score of 0.3857. Crucially, the DE demonstrated superior calibration (Mean ECE of 0.0728 and Negative Log-Likelihood (NLL) of 0.1916) and enabled the reliable decomposition of total uncertainty into its Aleatoric (irreducible data noise) and Epistemic (reducible model knowledge) components, with a mean Epistemic Uncertainty (EU) of 0.0240. These results establish the Deep Ensemble as a trustworthy and explainable platform, transforming the model from a probabilistic tool into a reliable clinical decision support system. Deep learning has achieved remarkable diagnostic accuracy in medical imaging, exemplified by the CheXNet model, which reached radiologist-level performance for pneumonia detection and classification across 14 thoracic pathologies.


Real-Time LiDAR Super-Resolution via Frequency-Aware Multi-Scale Fusion

Goo, June Moh, Zeng, Zichao, Boehm, Jan

arXiv.org Artificial Intelligence

Abstract-- LiDAR super-resolution addresses the challenge of achieving high-quality 3D perception from cost-effective, low-resolution sensors. While recent transformer-based approaches like TULIP show promise, they remain limited to spatial-domain processing with restricted receptive fields. We introduce FLASH (Frequency-aware LiDAR Adaptive Super-resolution with Hierarchical fusion), a novel framework that overcomes these limitations through dual-domain processing. FLASH integrates two key innovations: (i) Frequency-A ware Window Attention that combines local spatial attention with global frequency-domain analysis via FFT, capturing both fine-grained geometry and periodic scanning patterns at log-linear complexity. Extensive experiments on KITTI demonstrate that FLASH achieves state-of-the-art performance across all evaluation metrics, surpassing even uncertainty-enhanced baselines that require multiple forward passes. Notably, FLASH outperforms TULIP with Monte Carlo Dropout while maintaining single-pass efficiency, which enables real-time deployment. The consistent superiority across all distance ranges validates that our dual-domain approach effectively handles uncertainty through architectural design rather than computationally expensive stochastic inference, making it practical for autonomous systems. The high cost of high-resolution LiDAR sensors presents a fundamental challenge for autonomous systems.


CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation

Hasan, Md. Mehedi, Mostafiz, Rafid, Hossain, Md. Abir, Paul, Bikash Kumar

arXiv.org Artificial Intelligence

Accurate symptom-to-disease classification and clinically grounded treatment recommendations remain challenging, particularly in heterogeneous patient settings with high diagnostic risk. Existing large language model (LLM)-based systems often lack medical grounding and fail to quantify uncertainty, resulting in unsafe outputs. We propose CLIN-LLM, a safety-constrained hybrid pipeline that integrates multimodal patient encoding, uncertainty-calibrated disease classification, and retrieval-augmented treatment generation. The framework fine-tunes BioBERT on 1,200 clinical cases from the Symptom2Disease dataset and incorporates Focal Loss with Monte Carlo Dropout to enable confidence-aware predictions from free-text symptoms and structured vitals. Low-certainty cases (18%) are automatically flagged for expert review, ensuring human oversight. For treatment generation, CLIN-LLM employs Biomedical Sentence-BERT to retrieve top-k relevant dialogues from the 260,000-sample MedDialog corpus. The retrieved evidence and patient context are fed into a fine-tuned FLAN-T5 model for personalized treatment generation, followed by post-processing with RxNorm for antibiotic stewardship and drug-drug interaction (DDI) screening. CLIN-LLM achieves 98% accuracy and F1 score, outperforming ClinicalBERT by 7.1% (p < 0.001), with 78% top-5 retrieval precision and a clinician-rated validity of 4.2 out of 5. Unsafe antibiotic suggestions are reduced by 67% compared to GPT-5. These results demonstrate CLIN-LLM's robustness, interpretability, and clinical safety alignment. The proposed system provides a deployable, human-in-the-loop decision support framework for resource-limited healthcare environments. Future work includes integrating imaging and lab data, multilingual extensions, and clinical trial validation.


Confidence Calibration in Large Language Model-Based Entity Matching

Kamsteeg, Iris, Cardenas-Cartagena, Juan, van Beers, Floris, Holt, Gineke ten, Tashu, Tsegaye Misikir, Valdenegro-Toro, Matias

arXiv.org Artificial Intelligence

This research aims to explore the intersection of Large Language Models and confidence calibration in Entity Matching. To this end, we perform an empirical study to compare baseline RoBERTa confidences for an Entity Matching task against confidences that are calibrated using Temperature Scaling, Monte Carlo Dropout and Ensembles. We use the Abt-Buy, DBLP-ACM, iTunes-Amazon and Company datasets. The findings indicate that the proposed modified RoBERTa model exhibits a slight overconfidence, with Expected Calibration Error scores ranging from 0.0043 to 0.0552 across datasets. We find that this overconfidence can be mitigated using Temperature Scaling, reducing Expected Calibration Error scores by up to 23.83%.


Rethinking Inter-LoRA Orthogonality in Adapter Merging: Insights from Orthogonal Monte Carlo Dropout

Zhang, Andi, Ding, Xuan, Wang, Haofan, McDonagh, Steven, Kaski, Samuel

arXiv.org Artificial Intelligence

We propose Orthogonal Monte Carlo Dropout, a mechanism that enforces strict orthogonality when combining sparse semantic vectors without extra time complexity. Low-Rank Adaptation (LoRA), a popular fine-tuning method for large models, typically trains a module to represent a specific concept such as an object or a style. When multiple LoRA modules are merged, for example to generate an object in a particular style, their outputs (semantic vectors) may interfere with each other. Our method guarantees that merged LoRA modules remain orthogonal and thus free from direct interference. However, empirical analysis reveals that such orthogonality does not lead to the semantic disentanglement highlighted in prior work on compositional adaptation. This finding suggests that inter-LoRA orthogonality alone may be insufficient for achieving true semantic compositionality, prompting a re-examination of its role in adapter merging.


Dynamic Meta-Learning for Adaptive XGBoost-Neural Ensembles

Sedek, Arthur

arXiv.org Artificial Intelligence

Abstract--This paper introduces a novel adaptive ensemble framework that synergistically combines XGBoost and neural networks through sophisticated meta-learning. The proposed method leverages advanced uncertainty quantification techniques and feature importance integration to dynamically orchestrate model selection and combination. Experimental results demonstrate superior predictive performance and enhanced interpretability across diverse datasets, contributing to the development of more intelligent and flexible machine learning systems. In the rapidly evolving landscape of machine learning, the quest for models that can adapt to complex, heterogeneous data while maintaining high predictive accuracy remains a significant challenge. T raditional approaches often rely on either tree-based methods, such as XGBoost [1], known for their effectiveness with tabular data, or neural networks [2], [14], celebrated for their ability to capture intricate patterns [15].


Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty-Based Phoneme Difficulty Score for Guided Sampling

Pokel, Niclas, Moure, Pehuén, Boehringer, Roman, Gao, Yingqiang

arXiv.org Artificial Intelligence

Automatic speech recognition (ASR) systems struggle with non-normative speech from individuals with impairments caused by conditions like cerebral palsy or structural anomalies. The high acoustic variability and scarcity of training data severely degrade model performance. This work introduces a data-efficient personalization method that quantifies phoneme-level uncertainty to guide fine-tuning. We leverage Monte Carlo Dropout to estimate which phonemes a model finds most difficult and use these estimates for a targeted oversampling strategy. We validate our method on English and German datasets. Crucially, we demonstrate that our model-derived uncertainty strongly correlates with phonemes identified as challenging in an expert clinical logopedic report, marking, to our knowledge, the first work to successfully align model uncertainty with expert assessment of speech difficulty. Our results show that this clinically-validated, uncertainty-guided sampling significantly improves ASR accuracy, delivering a practical framework for personalized and inclusive ASR.


A Quad-Step Approach to Uncertainty-Aware Deep Learning for Skin Cancer Classification

Asgharnezhad, Hamzeh, Tabarisaadi, Pegah, Khosravi, Abbas, Alizadehsani, Roohallah, Acharya, U. Rajendra

arXiv.org Artificial Intelligence

Accurate skin cancer diagnosis is vital for early treatment and improved patient outcomes. Deep learning (DL) models have shown promise in automating skin cancer classification, yet challenges remain due to data scarcity and limited uncertainty awareness. This study presents a comprehensive evaluation of DL-based skin lesion classification with transfer learning and uncertainty quantification (UQ) on the HAM10000 dataset. We benchmark several pre-trained feature extractors -- including CLIP variants, ResNet50, DenseNet121, VGG16, and EfficientNet-V2-Large -- combined with traditional classifiers such as SVM, XGBoost, and logistic regression. Multiple principal component analysis (PCA) settings (64, 128, 256, 512) are explored, with LAION CLIP ViT-H/14 and ViT-L/14 at PCA-256 achieving the strongest baseline results. In the UQ phase, Monte Carlo Dropout (MCD), Ensemble, and Ensemble Monte Carlo Dropout (EMCD) are applied and evaluated using uncertainty-aware metrics (UAcc, USen, USpe, UPre). Ensemble methods with PCA-256 provide the best balance between accuracy and reliability. Further improvements are obtained through feature fusion of top-performing extractors at PCA-256. Finally, we propose a feature-fusion based model trained with a predictive entropy (PE) loss function, which outperforms all prior configurations across both standard and uncertainty-aware evaluations, advancing trustworthy DL-based skin cancer diagnosis.


Surrogate Modelling of Proton Dose with Monte Carlo Dropout Uncertainty Quantification

Pim, Aaron, Pryer, Tristan

arXiv.org Machine Learning

Accurate proton dose calculation using Monte Carlo (MC) is computationally demanding in workflows like robust optimisation, adaptive replanning, and probabilistic inference, which require repeated evaluations. To address this, we develop a neural surrogate that integrates Monte Carlo dropout to provide fast, differentiable dose predictions along with voxelwise predictive uncertainty. The method is validated through a series of experiments, starting with a one-dimensional analytic benchmark that establishes accuracy, convergence, and variance decomposition. Two-dimensional bone-water phantoms, generated using TOPAS Geant4, demonstrate the method's behavior under domain heterogeneity and beam uncertainty, while a three-dimensional water phantom confirms scalability for volumetric dose prediction. Across these settings, we separate epistemic (model) from parametric (input) contributions, showing that epistemic variance increases under distribution shift, while parametric variance dominates at material boundaries. The approach achieves significant speedups over MC while retaining uncertainty information, making it suitable for integration into robust planning, adaptive workflows, and uncertainty-aware optimisation in proton therapy.